Adaptive Batching of Streams to Enhance Throughput and to Support Dynamic Load Balancing
Author
Abstract
As data permeates all disciplines, big data is becoming increasingly important. Sensors, IoT devices, social networks, and online transactions all generate data that can be monitored continuously, enabling a business to identify opportunities to improve customer service and increase revenue. This need for real-time processing of big data has led to the development of frameworks for distributed stream processing on clusters. Such frameworks must be resilient to variable operating conditions such as fluctuations in server load, changes in data ingestion rates, and shifts in workload characteristics. In this thesis, we explore the effect of batch size on the performance of streaming workloads by developing an adaptive batching framework and building load-balancing algorithms on top of it. We explore combining adaptive batching of tuples with dynamic tuple dispatching to improve the throughput and load distribution of the workload. Through experiments, we show that the system remains resilient and robust under varying operating conditions.
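The abstract does not specify which batching or dispatching policies the framework uses. The following is a minimal Python sketch under assumed policies, not the thesis's method: the batch size is adapted with an additive-increase/multiplicative-decrease (AIMD) rule against a hypothetical per-batch latency target, and each batch is routed to the worker with the fewest pending tuples (a join-the-shortest-queue rule). The names `AimdBatchSizer`, `LeastLoadedDispatcher`, and `batches`, and parameters such as `target_latency_s` and `step`, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class AimdBatchSizer:
    """Hypothetical AIMD controller for the batch size.

    Grows the batch additively while batches finish within the latency
    target, and halves it as soon as a batch overshoots the target.
    """
    target_latency_s: float
    min_size: int = 1
    max_size: int = 4096
    step: int = 8
    size: int = 8

    def observe(self, batch_latency_s: float) -> int:
        """Feed back one batch's measured latency; return the new batch size."""
        if batch_latency_s <= self.target_latency_s:
            # Additive increase: probe for more throughput.
            self.size = min(self.max_size, self.size + self.step)
        else:
            # Multiplicative decrease: back off quickly under overload.
            self.size = max(self.min_size, self.size // 2)
        return self.size


class LeastLoadedDispatcher:
    """Hypothetical dispatcher: send each batch to the worker with the fewest queued tuples."""

    def __init__(self, num_workers: int) -> None:
        self.pending = [0] * num_workers  # queued tuples per worker

    def dispatch(self, batch_size: int) -> int:
        # Ties go to the lowest-numbered worker.
        worker = min(range(len(self.pending)), key=self.pending.__getitem__)
        self.pending[worker] += batch_size
        return worker

    def complete(self, worker: int, batch_size: int) -> None:
        """Record that a worker has finished processing a batch."""
        self.pending[worker] -= batch_size


def batches(stream: Iterable[object], sizer: AimdBatchSizer) -> Iterator[List[object]]:
    """Cut a tuple stream into batches, reading the current size at every boundary."""
    batch: List[object] = []
    for tup in stream:
        batch.append(tup)
        if len(batch) >= sizer.size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```

In use, a driver would call `observe()` with each batch's measured processing latency and `complete()` when a worker finishes a batch; because `batches` reads `sizer.size` at each boundary, a new size takes effect from the next batch onward. AIMD is used here only because it grows cautiously and backs off quickly; a throughput-maximizing search or another controller could be substituted.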
Similar resources
Enhancement of Power System Voltage Stability Using New Centralized Adaptive Load Shedding Method
This paper presents a new centralized adaptive under-frequency load shedding method. Sometimes, after the initial frequency drop following a severe disturbance, the system frequency returns to its permissible value, yet the system may still become unstable because of voltage problems. In this regard, the paper proposes a new centralized adaptive load shedding method to enhance the voltage stab...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing rack-aware data placement strategy of the Hadoop distributed file system assume, for MapReduce in a homogeneous cluster, that each node has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Adaptive Fault-Tolerance for Dynamic Resource Provisioning in Distributed Stream Processing Systems
A growing number of applications require continuous processing of high-throughput data streams, e.g., financial analysis, network traffic monitoring, or Big Data analytics for smart cities. Stream processing applications typically require specific quality-of-service levels to achieve their goals; yet, due to the high time-variability of stream characteristics, it is often inefficient to statica...
Load Balancing Approaches for Web Servers: A Survey of Recent Trends
Much work has been done on load balancing of web servers in grid environments. Grid environments are popular because they provide access to distributed resources located at remote sites. For effective utilization, load must be balanced among all resources. The importance of load balancing is discussed by contrasting a system without load balancing with one with loa...
A Review of Ad Hoc On-demand Distance Vector Routing Protocol for Mobile Ad Hoc Networks
A mobile ad hoc network is a network that uses multi-hop radio relaying and can operate without the support of any fixed infrastructure. Efficient dynamic routing is a challenge in such a network. On-demand routing protocols are widely used in ad hoc networks because of their effectiveness and efficiency. In this paper, the significance of Ad hoc On-Demand Distance Vector (AOD...